# Cross-modal translation

Seamless M4t V2 Large

SeamlessM4T is a large-scale multilingual, multimodal machine translation model that supports speech and text translation across nearly 100 languages.

Text-to-Audio, Supports Multiple Languages

Vit Roberta Fa Image Captioning Flickr30k

A Persian image captioning model based on a ViT+RoBERTa architecture, designed to generate Persian text descriptions of images.

Image-to-Text, Other

A Transformer-based machine translation model that translates American Sign Language (ASE) to English (EN).

Machine Translation

Transformers, Supports Multiple Languages

© 2025 AIbase